Using Kart and GitHub for versioning and collaborating with spatial data in archaeological research

Archeo.FOSS 17 (Turin, 12-13 December)

University of Turin

University of Turin

Talk overview

  • Introduction
    • Open Science and version control in archaeology
    • Git and its limitations
  • Git for geospatial data
    • Description and features of Kart
  • Practical applications of Kart in archaeology
    • Description of the project
    • How we are using Kart
  • Thoughts and conclusions

Introduction

Open Science and transparency of the process

  • One (of many) aims of Open Science: openness and transparency of the process behind data creation and results
  • “Data must have history” Strupler and Wilkinson (2017)

Wallis (2022)

Version control

  • Transparent process through “snapshots” at different stages (roll-back if necessary)
  • Provides a solution to the multiple iterations of correction and renaming of the same file
  • Greater accountability and better documentation (Kansa 2012)
  • Enhances Open Science practices (Marwick 2017)

Source: xkcd

Git

  • Distributed version control system
  • Originally developed to track changes in the Linux kernel
  • Also adapted to non-programming applications
  • Git is still not user-friendly software
  • Graphical frontends do not always help

Source: xkcd

Distributed version control and archaeology

  • Archaeology has come a long way in adopting version control
  • Applied mainly to programming/scripting and to publication
  • Some attempts to adapt it to fieldwork practices


Source: Strupler and Wilkinson (2017: 5)

Source: Strupler and Wilkinson (2017: 4)

Git and binary files

  • Binary files: images, Word documents, Excel files
  • Git is not as efficient with binary files as it is with plain text (it saves the entire file every time)
  • Storage issues, harder to track changes
  • For text files, plain text can sometimes be the answer, but what about GIS and relational databases?

Example diff of a plain text file, with additions visible in green

Example diff of binary file, no change visible
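
The contrast between the two diffs can be reproduced with a minimal Python sketch. The small site table and the use of zlib compression as a stand-in for a binary format are assumptions made for illustration only; they are not taken from the project data.

```python
import difflib
import zlib

# Two versions of a tiny, hypothetical site table; one cell changes between them.
v1 = "id,name,period\n1,Site A,Iron Age\n2,Site B,Bronze Age\n3,Site C,Iron Age\n"
v2 = v1.replace("2,Site B,Bronze Age", "2,Site B,Iron Age")

# Plain text: a line-based diff isolates the single edited row.
text_diff = [
    line
    for line in difflib.unified_diff(v1.splitlines(), v2.splitlines(), lineterm="", n=0)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(text_diff))

# Binary stand-in: the same content, zlib-compressed.
b1 = zlib.compress(v1.encode())
b2 = zlib.compress(v2.encode())

# A line-based diff has nothing readable to show here;
# all we can report is that the two byte strings differ.
binary_changed = b1 != b2
print("binary files differ:", binary_changed)
```

The text diff names the row that changed, while the binary comparison can only say that something changed.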

What about geospatial data?

  • In GIS, the research process is often obscured by the point-and-click nature of the GUI
  • QGIS models can surely help reproducibility of some analyses
  • Scripts for data cleaning

For many in archaeology, for whom using GIS to visualise results is essentially a graphical-based point-and-click process, advocating a return to code may seem like a backward step. We understand the arguments for usability, and acknowledge that intermediate tools which can bridge point-and-click with code-based approaches are desperately required.

Strupler and Wilkinson (2017)

Git for geospatial data

Git for distributed version control of geospatial data


Kart features

  • Works with different file formats and databases: GeoPackage, PostgreSQL/PostGIS, MySQL, Microsoft SQL Server
  • Supports most geospatial data types: vector, raster, point clouds, LiDAR, etc.
  • Planned support for shapefiles
  • “Built on git, works like git”

Kart features

  • Tracks changes at the row and cell level within each layer
  • Command Line Interface tool
  • Standard git workflow
    • kart status
    • kart add
    • kart commit
    • kart pull
    • kart push
    • kart log
    • kart switch/branch
  • Scriptable
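
To make "changes at the row and cell level" concrete, here is a small Python sketch of the idea. It is not Kart's implementation or output format: the function `diff_tables` and the two sample tables are hypothetical, and each table version is simply modelled as a dictionary keyed by primary key.

```python
from typing import Any

Row = dict[str, Any]


def diff_tables(old: dict[int, Row], new: dict[int, Row]) -> dict[str, Any]:
    """Compare two versions of a table keyed by primary key (illustrative only)."""
    inserted = sorted(new.keys() - old.keys())  # rows present only in the new version
    deleted = sorted(old.keys() - new.keys())   # rows present only in the old version
    updated: dict[int, dict[str, tuple]] = {}
    for key in sorted(old.keys() & new.keys()):
        # Keep only the cells whose value differs, as (old value, new value).
        cells = {
            col: (old[key].get(col), new[key].get(col))
            for col in sorted(old[key].keys() | new[key].keys())
            if old[key].get(col) != new[key].get(col)
        }
        if cells:
            updated[key] = cells
    return {"inserted": inserted, "deleted": deleted, "updated": updated}


# Hypothetical attribute table before and after an editing session.
old = {
    1: {"name": "Site A", "period": "Iron Age"},
    2: {"name": "Site B", "period": "Bronze Age"},
}
new = {
    1: {"name": "Site A", "period": "Iron Age"},
    2: {"name": "Site B", "period": "Iron Age"},
    3: {"name": "Site C", "period": "Iron Age"},
}

changes = diff_tables(old, new)
print(changes)
```

Row 3 is reported as inserted and only the `period` cell of row 2 as updated, which is the level of detail a whole-file comparison cannot give.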

Kart QGIS Plugin

  • QGIS plugin offers a Graphical User Interface
  • All the Kart commands are available
  • Visual tool to inspect changes

Remote Collaboration

  • Host data in remote repositories
  • Compatible with all QGIS styles
  • Potential to mitigate common issues with data sharing

Kart for archaeology

Project presentation

Dataset

  • Digitisation still in progress
  • 2065 sites collected so far
  • 5684 occupation phases

Dataset organization

  • QGIS attribute table
  • QGIS form
    • General
    • Archaeological
    • Geospatial
    • References
  • Background tables
  • All versioned in Kart

Project structure

  • Organization on GitHub
  • Project actions treated as GitHub issues
  • Different repositories depending on data
  • Granular control of licenses, publications, repo access

Using Kart in our project

  • Relatively simple workflow
  • Two main uses
  • Collaboration between project members
    • Simple git workflow
    • Different branches for each person, pushing and merging to main
  • Keeping track of dataset changes
    • Transparency of the process
    • File (and methods) history
    • Inspect beyond the final product
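
The per-person branch workflow can be scripted. This Python sketch only builds and prints the command sequence and does not run Kart unless asked. It rests on several assumptions: the personal branch already exists, the main branch is called `main`, `kart commit` accepts `-m` as Git does, and `kart merge` (not listed among the commands earlier) is used for merging. The branch name and commit message are hypothetical.

```python
import shlex
import shutil
import subprocess


def branch_workflow(branch: str, message: str, main: str = "main") -> list[list[str]]:
    """Return the commands for one round of 'work on own branch, merge to main'."""
    return [
        ["kart", "switch", branch],         # move to the personal branch (assumed to exist)
        ["kart", "status"],                 # review pending edits in the working copy
        ["kart", "commit", "-m", message],  # record the edits
        ["kart", "push"],                   # publish the personal branch
        ["kart", "switch", main],           # move to the shared branch
        ["kart", "pull"],                   # fetch colleagues' changes first
        ["kart", "merge", branch],          # merge the personal branch into main
        ["kart", "push"],                   # publish the merged result
    ]


def run(steps: list[list[str]], repo: str = ".", dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    if not dry_run and shutil.which("kart") is None:
        raise RuntimeError("kart executable not found on PATH")
    lines = []
    for cmd in steps:
        line = shlex.join(cmd)
        print(line)
        lines.append(line)
        if not dry_run:
            subprocess.run(cmd, cwd=repo, check=True)
    return lines


steps = branch_workflow("member-a", "Add newly digitised sites")
planned = run(steps)  # dry run: nothing is executed
```

Keeping the dry run as the default means the sequence can be reviewed before it touches a repository; a first push of a new branch may need extra arguments that are not shown here.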

Using Kart in our project - issues

  • Not many issues until now (few people)
  • Collaboration tested on two macOS versions (13 Ventura and 12 Monterey); issues with macOS 11 Big Sur
  • Kart also tested on Ubuntu-based Linux (Pop!_OS)
  • Conflicts with primary keys when working with GeoPackages
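
One plausible way such primary key conflicts arise is sketched below in Python. This is an assumption about the mechanism, not a record of the exact problem met in the project. A GeoPackage is an SQLite database, so two in-memory SQLite databases stand in for two collaborators' working copies. The table `sites`, its columns and the site names are hypothetical.

```python
import sqlite3


def make_copy() -> sqlite3.Connection:
    """Create one collaborator's copy of the same starting dataset."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sites (fid INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
    con.executemany(
        "INSERT INTO sites (name) VALUES (?)",
        [("Site A",), ("Site B",), ("Site C",)],
    )
    con.commit()
    return con


copy_a, copy_b = make_copy(), make_copy()

# Each collaborator adds a different new site, offline, on their own branch.
fid_a = copy_a.execute("INSERT INTO sites (name) VALUES (?)", ("Tell North",)).lastrowid
fid_b = copy_b.execute("INSERT INTO sites (name) VALUES (?)", ("Cave East",)).lastrowid

# Both databases hand out the next free integer, so the two new rows share a key.
rows_a = dict(copy_a.execute("SELECT fid, name FROM sites"))
rows_b = dict(copy_b.execute("SELECT fid, name FROM sites"))
conflicts = sorted(k for k in rows_a.keys() & rows_b.keys() if rows_a[k] != rows_b[k])
print(fid_a, fid_b, conflicts)
```

Both inserts receive the same `fid`, so a merge sees two different features claiming one key and a person has to decide which one is renumbered.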

Using Kart in our project

  • Public project wiki
  • How to use the dataset and how to use Kart
  • Tips to solve common issues
  • Methodology and conventions
  • Internal use and external reference
  • Updated as the project proceeds

Conclusions

Advantages

  • Git-based tool + graphical solution for those unfamiliar with Git
  • Fieldwork (no internet connection needed unless you push changes to remote)
  • Kart can fit well into archaeological Open Science practices
  • More transparency both during and after the data creation process
  • No single file to download from online repositories (site stewardship)

Disadvantages

  • Not an easily accessible tool
  • The graphical interface still needs more work
  • Solving primary key conflicts requires the command line
  • Documentation is still catching up with recent developments
    • Contribution to upstream from our wiki

Thank you!


Andrea Titolo (andrea.titolo@unito.it) - andreatitolo@archaeo.social

Alessio Palmisano (alessio.palmisano@unito.it) - AlePalmi82


Interactive Presentation


https://zenodo.org/doi/10.5281/zenodo.10369518

Slides Source Code - CC BY-SA-4.0

Works Cited

Bar, S. and Zertal, A. (2021). The Manasseh Hill Country Survey Volume 6: The Eastern Samaria Shoulder, from Nahal Tirzah (Wadi Far’ah) to Maale Ephraim Junction, Brill.
Bar, S. and Zertal, A. (2022). The Manasseh Hill Country Survey Volume 7: The South-Eastern Samaria Shoulder, from Wadi Rashash to Wadi Aujah, Brill.
Coup, R. (2022a). Kart: An introduction to practical data versioning for geospatial.
Coup, R. (2022b). Kart: A Practical Tool for Versioning Geospatial Data.
Coup, R. (2023). 2023 QGIS Data Versioning with Kart - Robert Coup.
Finkelstein, I., Lederman, Z. and Bunimovitz, S. (1997). Highlands of Many Cultures: The Southern Samaria Survey; The Sites, Institute of Archaeology of Tel-Aviv University, Publications Section.
Kansa, E. (2012). Openness and Archaeology’s Information Ecosystem. World Archaeology 44: 498–520.
KartContributors (2023). Kart geospatial data version-control software.
Kloner, A. (2000). Survey of Jerusalem The Southern Sectors.
Marwick, B. (2017). Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory 24: 424–450.
Olaya, V. (2022). Spatial data versioning with the Kart QGIS Plugin with Victor Olaya.
Strupler, N. and Wilkinson, T. C. (2017). Reproducibility in the field: Transparency, version control and collaboration on the Project Panormos Survey.
Wallis, K. (2022). Open Science: A practical guide for PhD students, University College London.
Zertal, A. (2004). The Manasseh Hill Country Survey, Volume 1: The Shechem Syncline, Brill.
Zertal, A. (2007). The Manasseh Hill Country Survey, Volume 2: The Eastern Valleys and the Fringes of the Desert, Brill.
Zertal, A. and Bar, S. (2017). The Manasseh Hill Country Survey Volume 4: From Nahal Bezeq to the Sartaba, Brill.
Zertal, A. and Bar, S. (2019). The Manasseh Hill Country Survey Volume 5: The Middle Jordan Valley, from Wadi Fasael to Wadi Aujah, Brill.
Zertal, A. and Mirkam, N. (2016). The Manasseh Hill Country Survey: Volume 3: From Nahal Iron to Nahal Shechem, Brill.